LAPACK Working Note 56: Conjugate Gradient Algorithms with Reduced Synchronization Overhead on Distributed Memory Multiprocessors
Authors
Abstract
The standard formulation of the conjugate gradient algorithm involves two inner product computations. The results of these two inner products are needed to update the search direction and the computed solution. Since these inner products are mutually interdependent, in a distributed memory parallel environment their computation and subsequent distribution require two separate communication and synchronization phases. In this paper, we present three related, mathematically equivalent rearrangements of the standard algorithm that reduce the number of communication phases. We present empirical evidence that two of these rearrangements are numerically stable. This claim is further substantiated by a proof that one of the empirically stable rearrangements arises naturally in the symmetric Lanczos method for linear systems, which is equivalent to the conjugate gradient method.
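To illustrate the kind of rearrangement the abstract describes, the sketch below fuses the two inner products of a conjugate gradient iteration in the style of the Chronopoulos-Gear recurrence. This is an assumption for illustration only, not necessarily one of the paper's three rearrangements: here both inner products depend only on the current residual r and w = Ar, so on a distributed-memory machine they could be combined into a single all-reduce per iteration instead of two separate synchronization phases.

```python
import numpy as np

def cg_fused(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned CG with fused inner products (hypothetical sketch,
    Chronopoulos-Gear-style recurrence). Mathematically equivalent to
    standard CG in exact arithmetic."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = np.zeros_like(b)
    s = np.zeros_like(b)          # maintained so that s = A @ p
    gamma_old = alpha = None
    for _ in range(max_iter):
        w = A @ r
        gamma = r @ r             # both inner products use only r and w,
        delta = r @ w             # so one combined reduction would suffice
        if gamma <= tol**2 * (b @ b):
            break
        if gamma_old is None:     # first iteration: beta = 0
            beta = 0.0
            alpha_new = gamma / delta
        else:
            beta = gamma / gamma_old
            # recurrence replacing alpha = gamma / (p . A p)
            alpha_new = gamma / (delta - beta * gamma / alpha)
        p = r + beta * p
        s = w + beta * s          # s = A p without an extra matrix-vector product
        x = x + alpha_new * p
        r = r - alpha_new * s
        gamma_old, alpha = gamma, alpha_new
    return x
```

The recurrence for alpha follows from the CG orthogonality relations; the price of fusing the reductions is one extra vector update per iteration and, as the paper's stability analysis addresses, potentially different rounding behavior.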
Similar Papers
LAPACK Working Note ?: LAPACK Block Factorization Algorithms on the Intel iPSC/860
The aim of this project is to implement the basic factorization routines for solving linear systems of equations and least squares problems from LAPACK—namely, the blocked versions of LU with partial pivoting, QR, and Cholesky—on a distributed-memory machine. We discuss our implementation of each of the algorithms and the results we obtained using varying matrix orders and block sizes.
LAPACK Working Note 51: Qualitative Properties of the Conjugate Gradient and Lanczos Methods in a Matrix Framework
This paper presents the conjugate gradient and Lanczos methods in a matrix framework, focusing mostly on orthogonality properties of the various vector sequences generated. Various aspects of the methods, such as choice of inner product, preconditioning, and relations to other iterative methods will be considered. Minimization properties of the methods and the fact that they can compute success...
Performance of Parallel Branch and Bound Algorithms on the KSR1 Multiprocessor
In this paper we consider the parallelization of the branch and bound (BB) algorithm with best-first search strategy on the KSR1 shared-memory multiprocessor. Two shared-memory parallel BB algorithms are implemented on a 56-processor system. Measurements indicate that the scalability of the two algorithms is limited by the cost of interprocessor communication and by the cost of synchronization...
LAPACK Working Note 58: The Design of Linear Algebra Libraries for High Performance Computers
This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under develo...
LAPACK: Linear Algebra Software for Supercomputers
This paper presents an overview of the LAPACK library, a portable, public-domain library to solve the most common linear algebra problems. This library provides a uniformly designed set of subroutines for solving systems of simultaneous linear equations, least-squares problems, and eigenvalue problems for dense and banded matrices. We elaborate on the design methodologies incorporated to make ...
Publication date: 1999